Hybrid-Quantum Neural Architecture Search for the Proximal Policy Optimization Algorithm
Quantum machine learning (QML) has emerged in recent years as a promising approach to advancing classical machine learning, since quantum models can evaluate classically intractable functions [1]: they have access to properties that classical computational models lack, such as the exponentially large Hilbert space, entanglement, and the parallelism afforded by superposition. However, because we are still in the NISQ era, with limited quantum computers, many studies have explored hybrid architectures in the hope of extracting an advantage from currently existing quantum hardware, or of distributing the work harmoniously and efficiently between the classical and quantum parts, each doing what it is good at. An important distinction must be made here: we do not consider a variational quantum circuit (VQC) on its own to be a hybrid classical-quantum model, since the classical computer is used only to train and optimize the quantum circuit and does not take part in the actual learning and inference of the model. Instead, we treat the VQC itself as a quantum layer in the neural network of the hybrid model, alongside classical layers of artificial neurons. In this paper, we apply the proximal policy optimization (PPO) algorithm to the classical CartPole environment to ascertain whether a quantum-enhanced PPO algorithm gives any advantage over the classical version. In particular, we use a genetic-algorithm-esque version of quantum PPO to train the system. This paper is organized as follows: in Sec. 2, we examine studies analogous to the experiments we have performed in order to gauge what has been done and how we can improve upon it. In Sec. 3, we provide a brief synopsis of the operation of the PPO algorithm.
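Since the abstract refers to the operation of the PPO algorithm, a minimal sketch of its core component, the clipped surrogate objective, may be helpful. This is a generic illustration of standard PPO, not the paper's hybrid or genetic variant; the function name and default clipping parameter `eps=0.2` are illustrative choices.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss at the heart of PPO.

    ratio:     per-sample probability ratio pi_theta(a|s) / pi_theta_old(a|s)
    advantage: per-sample advantage estimate A_t
    eps:       clipping parameter (0.2 is a commonly used default)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    # Unclipped policy-gradient surrogate.
    unclipped = ratio * advantage
    # Clipping the ratio removes the incentive to move the new policy
    # far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; the loss is its negative mean.
    return -np.mean(np.minimum(unclipped, clipped))
```

The elementwise minimum makes the objective pessimistic: a large ratio only helps when it does not overshoot the clipping range, which is what keeps PPO updates "proximal".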
Learning Curricula in Open-Ended Worlds
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.
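As a sketch of the minimax-regret decision theory mentioned above, UED is commonly formulated as a game in which an environment designer picks parameters $\theta$ to maximize the agent's regret while the agent minimizes it; writing $U^{\theta}(\pi)$ for the expected return of policy $\pi$ in the environment induced by $\theta$, this reads:

```latex
\pi^{*} \in \arg\min_{\pi \in \Pi} \; \max_{\theta \in \Theta} \; \operatorname{Regret}^{\theta}(\pi),
\qquad
\operatorname{Regret}^{\theta}(\pi) = \max_{\pi' \in \Pi} U^{\theta}(\pi') - U^{\theta}(\pi)
```

Environments the agent already masters (zero regret) and environments no policy can solve (also near-zero regret) are both uninformative under this objective, which is why the resulting autocurricula sit at the frontier of the agent's capabilities.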
Mission schedule of agile satellites based on Proximal Policy Optimization Algorithm
Mission scheduling of satellites is an important part of space operations nowadays, since the number and types of satellites in orbit are increasing tremendously and their corresponding tasks are becoming more and more complicated. In this paper, a mission schedule model combined with the Proximal Policy Optimization Algorithm (PPO) is proposed. Unlike traditional heuristic planning methods, this paper incorporates reinforcement learning algorithms and finds a new way to describe the problem. Several constraints, including data download, are considered.